This specification relates to identifying non-compositional compounds.
A non-compositional compound (“NCC”) is a phrase of two or more words where the words composing the phrase have different meanings in the compound than their conventional meanings. As a result, the meaning of an NCC cannot be derived from the meanings of the constituent words taken individually. For example, the phrases “red herring” and “hot dog” are non-compositional compounds (“NCC's”), as the constituent words “red”, “herring”, “hot”, and “dog” each have a meaning in the compound that differs from its conventional meaning. Taken together, “red herring” can refer to something that distracts attention from the real issue. Taken individually, however, the conventional meanings of “red” (a color) and “herring” (a fish) have no relation to the meaning of “red herring” (a distraction). NCC's are not limited to two-word phrases: the idiomatic phrase “kick the bucket” is an example three-word NCC.
By contrast, a compositional compound (“CC”) is a phrase of two or more words where the words composing the phrase have the same meanings in the compound as their conventional meanings. For example, “old lady” is a compositional compound that retains the conventional meaning of the individual words in the phrase.
Additionally, a partial compositional compound (“PCC”) is a phrase where at least one word of the phrase retains its conventional meaning in the compound. The phrase “baby spinach” is an example PCC.
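The three categories above can be illustrated as a classification over per-word labels that record whether each word keeps its conventional meaning inside the phrase. This is a minimal sketch only: the names `CompoundType` and `classify_compound` are hypothetical, the per-word labels are hand-supplied assumptions chosen to reproduce the classifications given in the text (the specification does not say how such labels are obtained), and the sketch assumes a PCC means that some, but not all, words retain their conventional meaning, so that it does not overlap with a CC.

```python
from enum import Enum


class CompoundType(Enum):
    NCC = "non-compositional"
    PCC = "partial compositional"
    CC = "compositional"


def classify_compound(retains_meaning: list[bool]) -> CompoundType:
    """Classify a phrase from hypothetical per-word labels.

    retains_meaning[i] is True when word i keeps its conventional
    meaning inside the compound. The labels are assumed to be supplied
    by hand; they are not derived from any model here.
    """
    if len(retains_meaning) < 2:
        raise ValueError("a compound has two or more words")
    if not any(retains_meaning):
        # No word keeps its conventional meaning.
        return CompoundType.NCC
    if all(retains_meaning):
        # Every word keeps its conventional meaning.
        return CompoundType.CC
    # Assumption: a PCC has some, but not all, words keeping their meaning.
    return CompoundType.PCC


# Illustrative, hand-assigned labels (assumptions for this sketch only).
examples = {
    "red herring": [False, False],
    "kick the bucket": [False, False, False],
    "old lady": [True, True],
    "baby spinach": [False, True],
}

for phrase, flags in examples.items():
    print(f"{phrase}: {classify_compound(flags).name}")
```

With these labels, “red herring” and “kick the bucket” come out as NCC's, “old lady” as a CC, and “baby spinach” as a PCC.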
Identifying phrases as NCC's is useful in information retrieval. For example, when searching for documents responsive to the query “hot dog,” knowledge that the query phrase is an NCC can improve the results by discounting documents that include only “hot” or “dog”, since such documents are likely unrelated to “hot dog”.
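The retrieval benefit described above can be sketched with a toy scoring function. This is an illustrative assumption rather than the method of this specification: the names `tokenize`, `contains_phrase`, and `score` and the `discount` factor of 0.1 are hypothetical, the base score simply counts the distinct query words a document contains, and the NCC flag is assumed to be already known for the query.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    # True if the phrase occurs as a contiguous run of tokens.
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def score(doc: str, query: str, is_ncc: bool, discount: float = 0.1) -> float:
    """Hypothetical scoring scheme: discount word-only matches for NCC queries."""
    doc_tokens = tokenize(doc)
    q_tokens = tokenize(query)
    # Base score: number of distinct query words present in the document.
    base = float(len(set(q_tokens) & set(doc_tokens)))
    if is_ncc and not contains_phrase(doc_tokens, q_tokens):
        # The query is an NCC but the document never uses the whole phrase,
        # so its individual-word matches are likely unrelated.
        return base * discount
    return base


docs = [
    "Grill a hot dog and add mustard.",
    "My dog sleeps all day.",
    "It was a hot day; the dog stayed inside.",
]
for d in docs:
    print(score(d, "hot dog", is_ncc=True), score(d, "hot dog", is_ncc=False), d)
```

When “hot dog” is treated as an NCC, the second and third documents, which mention “dog” or “hot” only as separate words, receive discounted scores, while the first document, which contains the whole phrase, keeps its full score. Without the NCC flag, the third document scores the same as the first.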